Search for: All records

Creators/Authors contains: "Elofsson, ed., Arne"

  1. Abstract

    Summary

    pyCapsid is a Python package developed to facilitate the characterization of the dynamics and quasi-rigid mechanical units of protein shells and other protein complexes. The package was developed in response to the rapid increase of high-resolution structures, particularly capsids of viruses, requiring multiscale biophysical analyses. Given a protein shell, pyCapsid generates the collective vibrations of its amino-acid residues, identifies quasi-rigid mechanical regions associated with the disassembly of the structure, and maps the results back to the input proteins for interpretation. pyCapsid summarizes the main results in a report that includes publication-quality figures.

    Availability and implementation

    pyCapsid’s source code is available under the MIT License on GitHub. It is compatible with Python 3.8–3.10 and has been deployed in two leading Python package-management systems, pip and Conda. Installation instructions and tutorials are available in the online documentation and in pyCapsid’s YouTube playlist. In addition, a cloud-based implementation of pyCapsid is available as a Google Colab notebook. The Colab version requires no installation and generates the same report and outputs as the installable version. Users can post issues regarding pyCapsid in the repository’s issues section.

     
  2. Abstract

    Motivation

    High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models for high-throughput prediction of which protein pairs interact is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein sequence, using tokens from a 21-letter discretized structural alphabet (3Di).

    Results

    We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves cross-species prediction of protein–protein interactions. TT3D (Topsy-Turvy 3D) thus offers a way to reuse the computational effort that goes into producing high-quality structural models from sequence, while remaining lightweight enough that high-quality binary protein–protein interaction predictions can be made genome-wide across all protein pairs.

    Availability and implementation

    TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.
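
The Motivation above describes the 3Di string as having one token per residue, so it lines up position by position with the amino-acid sequence. The sketch below shows one simple way to pair the two strings into per-residue feature vectors. It is an illustration only: the abstract does not say how TT3D combines the two inputs, and the alphabets, the function names `one_hot` and `encode_residues`, and the example strings are all assumptions.

```python
# Assumed alphabets: 20 standard residue letters plus X for "unknown".
# The 3Di alphabet here is a 21-symbol placeholder, not taken from the source.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"
THREE_DI = "ACDEFGHIKLMNPQRSTVWYX"


def one_hot(symbol, alphabet):
    """Return a 0/1 list with a single 1 at the symbol's index in the alphabet."""
    vec = [0] * len(alphabet)
    vec[alphabet.index(symbol)] = 1  # raises ValueError for an unknown symbol
    return vec


def encode_residues(aa_seq, three_di_seq):
    """Concatenate amino-acid and 3Di one-hot vectors for each residue."""
    if len(aa_seq) != len(three_di_seq):
        raise ValueError("amino-acid and 3Di strings must have the same length")
    return [
        one_hot(aa, AMINO_ACIDS) + one_hot(tok, THREE_DI)
        for aa, tok in zip(aa_seq, three_di_seq)
    ]


# Hypothetical three-residue example.
features = encode_residues("MKV", "DPA")
print(len(features), len(features[0]))
```

Each residue ends up with a 42-dimensional vector containing exactly two ones, one for its amino acid and one for its structural token.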

     
  3. Abstract

    Motivation

    Tandem mass spectrometry is an essential technology for characterizing chemical compounds at high sensitivity and throughput, and is commonly adopted in many fields. However, computational methods for automated compound identification from their MS/MS spectra are still limited, especially for novel compounds that have not been previously characterized. In recent years, in silico methods were proposed to predict the MS/MS spectra of compounds, which can then be used to expand the reference spectral libraries for compound identification. However, these methods did not consider the compounds’ 3D conformations, and thus neglected critical structural information.

    Results

    We present the 3D Molecular Network for Mass Spectra Prediction (3DMolMS), a deep neural network model to predict the MS/MS spectra of compounds from their 3D conformations. We evaluated the model on experimental spectra collected in several spectral libraries. The predicted spectra reached average cosine similarities of 0.691 and 0.478 with the experimental MS/MS spectra acquired in positive and negative ion modes, respectively. Furthermore, the 3DMolMS model can be generalized to the prediction of MS/MS spectra acquired by different labs on different instruments through minor fine-tuning on a small set of spectra. Finally, we demonstrate that the molecular representation learned by 3DMolMS from MS/MS spectra prediction can be adapted to enhance the prediction of chemical properties such as the elution time in liquid chromatography and the collisional cross section measured by ion mobility spectrometry, both of which are often used to improve compound identification.

    Availability and implementation

    The code for 3DMolMS is available at https://github.com/JosieHong/3DMolMS and the web service is at https://spectrumprediction.gnps2.org.
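
The Results above are reported as cosine similarity between predicted and experimental spectra. As a minimal sketch of that metric, the code below bins each spectrum by m/z and compares the resulting intensity vectors. The bin width, the function names, and the example peaks are assumptions for illustration; the abstract does not state the binning or preprocessing that 3DMolMS used.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin index -> intensity)."""
    binned = {}
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        binned[idx] = binned.get(idx, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra; 0.0 if either is empty."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(idx, 0.0) for idx, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical (m/z, intensity) peak lists, not data from the paper.
predicted = [(100.1, 50.0), (150.2, 100.0)]
experimental = [(100.3, 100.0), (150.0, 200.0)]
print(cosine_similarity(predicted, experimental))
```

Because the measure depends only on the direction of the intensity vectors, two spectra with the same relative peak heights score close to 1 regardless of absolute intensity, and spectra with no shared bins score 0.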

     
  4. Abstract

    Motivation

    Progress in neural networks and the availability of large numbers of homologous sequences from high-throughput sequencing have created exciting new opportunities to solve the protein contact prediction problem. In this work, we study how deep convolutional neural networks (ConvNets) may be best designed and developed to solve this long-standing problem.

    Results

    With publicly available datasets, we designed and trained various ConvNet architectures. We tested several recent deep learning techniques including wide residual networks, dropout and dilated convolutions. We studied the improvements in the precision of medium-range and long-range contacts, and compared the performance of our best architectures with the ones used in existing state-of-the-art methods. The proposed ConvNet architectures predict contacts with significantly higher precision than the architectures used in several state-of-the-art methods. When trained using the DeepCov dataset consisting of 3456 proteins and tested on the PSICOV dataset of 150 proteins, our architectures achieve up to 15% higher precision when the top L/2 long-range contacts are evaluated. Similarly, when trained using the DNCON2 dataset consisting of 1426 proteins and tested on 84 protein domains in the CASP12 dataset, our single network achieves 4.8% higher precision than the ensembled DNCON2 method when the top L long-range contacts are evaluated.

    Availability and implementation

    DEEPCON is available at https://github.com/badriadhikari/DEEPCON/.
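
The Results above report precision over the top L/2 and top L long-range contacts, where L is the protein length. The sketch below shows how such a metric is commonly computed. The sequence-separation cutoff of 24 residues for "long-range" is a widely used convention and an assumption here, since the abstract does not define it; the function name, the choice to divide by the number of pairs actually selected, and the example data are also hypothetical.

```python
def top_long_range_precision(pred_scores, true_contacts, seq_len,
                             divisor=2, min_sep=24):
    """Precision of the top (seq_len // divisor) long-range predictions.

    pred_scores:   dict mapping residue pairs (i, j) with i < j to a score
    true_contacts: set of (i, j) pairs that are contacts in the native structure
    divisor:       2 gives top L/2, 1 gives top L
    min_sep:       minimum |i - j| for a pair to count as long-range (assumed)
    """
    long_range = [
        (score, pair) for pair, score in pred_scores.items()
        if pair[1] - pair[0] >= min_sep
    ]
    long_range.sort(key=lambda item: item[0], reverse=True)
    k = max(1, seq_len // divisor)
    selected = long_range[:k]
    if not selected:
        return 0.0
    hits = sum(1 for _, pair in selected if pair in true_contacts)
    # Divides by the number of pairs selected, which can be fewer than k.
    return hits / len(selected)


# Hypothetical scores for a 60-residue protein.
scores = {(0, 30): 0.9, (1, 40): 0.8, (2, 50): 0.1, (3, 5): 0.99}
native = {(0, 30), (2, 50)}
print(top_long_range_precision(scores, native, seq_len=60))
```

The pair (3, 5) has the highest score but is ignored because its sequence separation is below the long-range cutoff.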

     
  5. Abstract

    Motivation

    The accuracy and success rate of de novo protein design remain limited, mainly because current energy functions suffer from parameter over-fitting and cannot discriminate incorrect designs from correct ones.

    Results

    We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues, respectively, for 148 test monomers, and was generally applicable to protein–protein interaction design, as it recapitulated 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues, respectively, for 88 test dimers, significantly outperforming EvoEF in native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences; all of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures, with 87.8% of the predicted structures having a root-mean-square deviation of less than 2 Å from their native counterparts. The study also demonstrated that the usefulness of physical energy functions is highly correlated with the parameter optimization process, and that EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.

    Availability and implementation

    The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF.

    Supplementary information

    Supplementary data are available at Bioinformatics online.
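
The recovery percentages in the Results above measure how often a designed sequence reproduces the native residue at the same position, overall and within subsets such as core or surface residues. A minimal sketch of that calculation follows. The function name, the example sequences, and the core and surface position lists are hypothetical; the abstract does not say how EvoEF2 assigns residues to these classes.

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of positions where the designed residue matches the native one.

    positions: optional iterable of 0-based indices restricting the comparison
               to a subset (for example core or surface residues).
    """
    if len(native) != len(designed):
        raise ValueError("native and designed sequences must have equal length")
    indices = list(range(len(native)) if positions is None else positions)
    if not indices:
        raise ValueError("no positions to compare")
    matches = sum(1 for i in indices if native[i] == designed[i])
    return matches / len(indices)


# Hypothetical sequences and residue classes, not data from the paper.
native_seq = "ACDEFGHIK"
designed_seq = "ACDQFGHLK"
core_positions = [0, 1, 2]
surface_positions = [3, 7]

print(sequence_recovery(native_seq, designed_seq))
print(sequence_recovery(native_seq, designed_seq, core_positions))
print(sequence_recovery(native_seq, designed_seq, surface_positions))
```

Reporting recovery separately per residue class, as the abstract does, shows where a design method agrees with nature: in this toy example the core is fully recovered while the surface positions are not.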

     
  6. Abstract

    Motivation

    Glycan microarrays can reveal the interactions of glycan-binding proteins (GBPs) with hundreds of defined glycan structures, and have revolutionized the investigation of the protein–carbohydrate interactions underlying numerous critical biological activities. However, it is difficult to interpret microarray data and identify the structural determinants that promote glycan binding to GBPs, because of the ambiguity in microarray fluorescence intensity and the complexity of branched glycan structures. To facilitate analysis of glycan microarray data alongside protein structure, we have built the Glycan Microarray Database (GlyMDB), a web-based resource including a searchable database of glycan microarray samples and a toolset for data/structure analysis.

    Results

    The current GlyMDB provides data visualization and glycan-binding motif discovery for 5203 glycan microarray samples collected from the Consortium for Functional Glycomics. The unique feature of GlyMDB is that it links microarray data to PDB structures. GlyMDB provides different options for database query, and allows users to upload their own microarray data for analysis. After a search or upload is complete, users can choose the criterion for binder versus non-binder classification. They can then view the signal intensity graph, including the binder/non-binder threshold, followed by a list of glycan-binding motifs. Users can also compare the fluorescence intensity data from two different microarray samples. A protein sequence-based search is performed using BLAST to match microarray data with all available PDB structures containing glycans. The glycan ligand information is displayed, and links are provided for structural visualization and redirection to other modules in GlycanStructure.ORG for further investigation of glycan-binding sites and glycan structures.

    Availability and implementation

    http://www.glycanstructure.org/glymdb.

    Contact

    wonpil@lehigh.edu

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  7. Abstract

    Motivation

    Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. While the functional proteolysis regulated by proteases plays a central role in ‘life and death’ cellular processes, many of the corresponding substrates and their cleavage sites have not yet been found. Accurate predictors of substrates and cleavage sites would facilitate understanding of proteases’ functions and physiological roles. Deep learning is a promising approach for the development of accurate predictors of substrate cleavage events.

    Results

    We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave uses protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. The high predictive performance of our models stems from the use of high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and from the application of transfer learning, multiple kernels and an attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms these methods in predicting caspase and matrix metalloprotease substrate-cleavage sites.

    Availability and implementation

    The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     